Motivation

Individuals of varying (and possibly suspect) taste were asked to recommend and rank the watchability of movies from 1 (hard pass) to 10 (must watch now). Also I’m bored as shit, so let’s do this…

IMDB Metascore Ratings

imdb_df <- 
  movie_v %>% 
  stringr::str_remove("\\s\\(\\d*\\)") %>% 
  map_dfr(imdbMovies) %>% 
  as_tibble()

imdb_df %>% 
  mutate(
    imdbVotes = parse_number(imdbVotes),
    Metascore = parse_number(Metascore),
    imdbRating = parse_number(imdbRating)
  ) %>% 
  ggplot(
    aes(
      x = fct_reorder(Title, imdbRating, .desc = TRUE),
      y = imdbRating
    )
  ) +
  # geom_col()
  geom_col_pattern(
    aes(
      pattern_filename = I(Posterlink)),
    pattern_gravity = "North",
    pattern         = "image",
    pattern_scale   = 1, 
    pattern_type    = 'fit',
    fill            = "#69b3a2",
    colour          = '#69b3a2'
  ) +
  dark_theme_gray(base_size = 16) +
  scale_x_discrete(
    guide = guide_axis(n.dodge = 2)
  ) +
  labs(x = NULL, 
       y = "IMDB Rating") + 
  theme(axis.title = element_text(colour = "#69b3a2"))

Now to see how close the QMC agrees…

Responses

movie_df %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = F)
Timestamp The Vast of Night (2019) Soylent Green (1973) Westworld (1973) See You Yesterday (2019) Gattaca (1997) Contact (1997)
1/9/2022 14:15:23 7 7 7 6 10 10
1/9/2022 13:24:00 9 7 8 8 6 1
1/9/2022 13:26:40 5 3 6 5 10 8

Movies

Overall Likeability

Which movies had the highest average score?

movie mean
Gattaca (1997) 8.666667
The Vast of Night (2019) 7.000000
Westworld (1973) 7.000000
Contact (1997) 6.333333
See You Yesterday (2019) 6.333333
Soylent Green (1973) 5.666667

Gattaca (1997) appears to be universally appreciated, while Soylent Green (1973) is clearly foul.

Other Voting Metrics

vote <- movie_df %>% 
  mutate(
    across(
      where(is.numeric),
      ~flip_convert(.x)
    )
  ) %>% 
  as.data.frame() %>% 
  column_to_rownames(var = "Timestamp") %>% 
  as.matrix() %>% 
  create_vote(xtype = 1)

approval <- approval_method(vote, n = 3)
borda <- borda_method(vote)
copeland <- cdc_copeland(vote)
dodgson <- cdc_dodgson(vote, dq_t = "t")

borda_df <- borda$other_info$count_min %>% 
  data.frame() %>% 
  rownames_to_column("Movie") %>% 
  rename(Rank = 2) %>% 
  arrange(Rank)

copeland_df <- copeland$other_info$copeland_score %>% 
  data.frame() %>% 
  rownames_to_column("Movie") %>% 
  rename(Rank = 2) %>% 
  arrange(desc(Rank))

dodgson_df <- dodgson$other_info$tideman %>% 
  data.frame() %>% 
  rownames_to_column("Movie") %>% 
  rename(Rank = 2) %>% 
  arrange(Rank)

Borda Method

A voter’s 1st choice gets 1 point, the 2nd choice gets 2 points… The movie with the smallest total score wins. The function does not require voters to assign scores to all Movies, for NAs are automatically assigned the highest (worst) score. Duplicated values (two or more movies share the same score) are also allowed.

Borda Winner: Gattaca (1997)

borda_df %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = F)
Movie Rank
Gattaca (1997) 7
The Vast of Night (2019) 12
Westworld (1973) 12
See You Yesterday (2019) 14
Contact (1997) 14
Soylent Green (1973) 16

Copeland

Movies enter into pairwise comparison. If the number of voters who prefer Movie A is larger than the number of voters who prefer Movie B , then Movie A wins Movie B, Movie A gets 1 point, and Movie B gets -1 point. If the numbers are equal, then both of them gets 0 point. Then, sum up each one’s comparison points. For example, Movie A wins 3 times, loses 1 time, has equal votes with 2 other movies, Movie A’s score is 3 * 1 + (-1) * 1 + 0 * 2 = 2. The movie that gets the most points wins.

Copeland Winner: Gattaca (1997)

copeland_df %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = F)
Movie Rank
Gattaca (1997) 5
Contact (1997) 3
The Vast of Night (2019) 0
Westworld (1973) 0
See You Yesterday (2019) -3
Soylent Green (1973) -5

Dodgson (Tideman Score) Method

Suppose the candidates are Paul Blart: Mall Cop 2, Casablanca, Howard the Duck and Citizen Kane. If Paul Blart: Mall Cop 2 wins Casablanca in pairwise comparison or has equal votes with Casablanca, then add 0 to Paul Blart: Mall Cop 2. If Howard the Duck wins Paul Blart: Mall Cop 2, then add to Paul Blart: Mall Cop 2 adv(Howard the Duck, Paul Blart: Mall Cop 2), that is, the number of voters that prefer Howard the Duck than Paul Blart: Mall Cop 2, minus the number of voters that prefer Paul Blart: Mall Cop 2 than Howard the Duck. Again, if Citizen Kane wins Paul Blart: Mall Cop 2, then add to Paul Blart: Mall Cop 2 that number. Then, we sum up the points that belong to Paul Blart: Mall Cop 2. We do the same thing to Casablanca, Howard the Duck and Citizen Kane. The one gets the least points is the winner.

Dodgson Winner: Gattaca (1997)

dodgson_df %>% 
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = F)
Movie Rank
Gattaca (1997) 0
The Vast of Night (2019) 2
Westworld (1973) 2
Contact (1997) 2
See You Yesterday (2019) 6
Soylent Green (1973) 7

Visualization

Boxplot

movie_long_df %>% 
  mutate(movie = factor(movie, levels = rev(movie_long_mean$movie))) %>% 
  ggplot(
    aes(
      x = movie, 
      y = vote, 
      color = movie,
      fill = movie
    )
  ) +
  geom_boxplot(alpha = 0.3) +
  geom_jitter(width = 0.1,
              height = 0.1) +
  scale_color_viridis_d() + 
  scale_fill_viridis_d() + 
  dark_theme_gray(base_size = 16) +
  labs(x = "Movie", 
       y = "Rank") +
  coord_flip() +
  guides(fill = "none",
         color = "none")

Ridgeline Plot

movie_long_df %>% 
  mutate(movie = factor(movie, levels = rev(movie_long_mean$movie))) %>% 
  ggplot(aes(y = movie, x = vote, 
             color = movie,
             fill = movie)) +
  geom_density_ridges(bandwidth = 0.5, alpha = 0.2) +
  theme_ridges() +
  dark_theme_gray(base_size = 16) +
  labs(x = "Rank", 
       y = "Movie") +
  guides(fill = "none",
         color = "none") +
  scale_color_viridis_d() + 
  scale_fill_viridis_d() + 
  scale_x_continuous(
    expand = c(0.01, 0), 
    breaks = 1:length(unique(movie_long_df$movie))
  )

Density Curve

Controversial

What about the highest variance?

movie stan_dev
Contact (1997) 4.725816
Gattaca (1997) 2.309401
Soylent Green (1973) 2.309401
The Vast of Night (2019) 2.000000
See You Yesterday (2019) 1.527525
Westworld (1973) 1.000000

Watchers

Who are the film critics, whom all others follow? Who are the freaks?

We define 2 measures of watching preference, mean absolute difference and the sum of squared differences.

Mean Absolute Difference

We take the average distance of each cinema-goer’s rank for a given movie from that movie’s sample mean.

Timestamp mean_abs_diff
1/9/2022 14:15:23 1.111111
1/9/2022 13:26:40 1.666667
1/9/2022 13:24:00 2.333333

On down the list, you have the lesser movie critics, with indisputably irrational preferences. Timestamps are provided to keep responses anonymous, but you know who you are…

Sum of Squared Differences

According to some, the sum of squared differences is a demonstrably poorer metric than mean absolute deviation, but I indulge the reader.

Timestamp sum_sqrd_diff
1/9/2022 14:15:23 17.11111
1/9/2022 13:26:40 18.44444
1/9/2022 13:24:00 45.11111